For the last four decades, students’
scores on standardized tests have 
increasingly been regarded as the 
most meaningful evidence for evalu- 
ating U.S. schools. Most Americans, 
indeed, believe students’ standardized- 
test performances are the only legitimate 
indicator of a school’s instructional 
effectiveness. Yet, although test-based evaluations of 
schools seem to occur almost as often as fire drills, in 
most instances these evaluations are inaccurate. That’s 
because the standardized tests employed are flat-out
wrong.

Standardized tests have been used to evaluate
America’s schools since 1965, when the U.S. Elementary
and Secondary Education Act (ESEA) became law. That statute provided
for the first major infusion of federal funds into local 
schools and required educators to produce test-based 
evidence that ESEA dollars were well spent. 

But how, you might ask, could a practice that’s been 
so prevalent for so long be mistaken? Just think back 
to the many years we forced airline attendants and 
nonsmokers to suck in secondhand toxins because 
smoking on airliners was prohibited only during takeoff 
and landing. Some screwups can linger for a long time. 

But mistakes, even ones we’ve lived with for decades, 
can often be corrected once they’ve been identified, 
and that’s what we must do to halt today’s wrongheaded 
school evaluations. If enough educators
(and noneducators) realize that there
are serious flaws in the way we evaluate
our schools, and that those flaws erode
educational quality, there’s a chance we
can stop this absurdity.

By W. James Popham




Instructionally Insensitive 

First, some definitions. 

A standardized test is any test that's admin- 
istered, scored, and interpreted in a standard, 
predetermined manner. Standardized aptitude 
tests are designed to make predictions about 
how a test taker will perform in a subsequent 
setting. For example, the SAT and ACT are 
used to predict the grades that high school 
students will earn when they get to college. In 
contrast, standardized achievement tests indi- 
cate how well a test taker has acquired know- 
ledge and mastered certain skills. 

Although students’ scores on standardized 
aptitude tests are sometimes unwisely stirred 
into the school-evaluation stew, scores on 
standardized achievement tests are typically 
the ones used to judge a school’s success. 
Two kinds of standardized achievement tests 
commonly used for school evaluations are ill 
suited for that measurement. 

The first of these categories consists of nationally
standardized achievement tests like the Iowa
Tests of Basic Skills, which employ a comparative
measurement strategy. The fundamental
purpose of all such tests is to compare a student’s
score with the scores earned by a previous
group of test takers (known as the
“norm group”). It can then be determined if
Johnny scored at the 95th percentile on a
given test (attaboy!) or at the 10th percentile
(son, we have a problem).

Because of the need for nationally standardized
achievement tests to provide fine-grained,
percentile-by-percentile comparisons, it is
imperative that these tests produce a considerable
degree of score-spread (in other
words, plenty of differences among test takers’
scores). So producing score-spread often
preoccupies those who construct standardized
achievement tests.

Statistically, a question that creates the
most score-spread on standardized achievement
tests is one that only about half the students
answer correctly. Over the years, developers
of standardized achievement tests have
learned that if they can link students’ success
on a question to students’ socioeconomic status
(SES), then that item is usually answered
correctly by about half of the test takers. If an
item is answered correctly more often by students
at the upper end of the socioeconomic
scale than by lower-SES kids, that question
will provide plenty of score-spread. After all,
SES is a delightfully spread-out variable and
one that isn’t quickly altered. As a result, in
today’s nationally standardized achievement
tests, there are many SES-linked items.

Unfortunately, this kind of test tends to
measure not what students have been taught
in school but what they bring to school. That’s
the reason there’s such a strong relationship
between a school’s standardized-test scores
and the economic and social makeup of that
school’s student body. As a consequence, most
nationally standardized achievement tests end
up being instructionally insensitive. That is,
they’re unable to detect improved instruction
in a school even when it has definitely taken
place. Because of this insensitivity, when students’
scores on such tests are used to evaluate
a school’s instructional performance, that
evaluation usually misses the mark.

A second kind of instructionally insensitive
test is the sort of standardized achievement
test that has been developed for
accountability by many states during the past
two decades. Such tests were typically created
to better assess students’ mastery of the offi- 
cially approved skills and knowledge. Those 
skills and knowledge, sometimes referred to 
as goals or curricular aims, are usually known 
these days as content standards. Thus, such 
state-developed standardized assessments — 
like the Florida Comprehensive Assessment 
Test (FCAT) — are frequently described as 
“standards-based” tests. 

Because these customized standards-based 
tests were designed (almost always with the 
assistance of an external test-development 
contractor) to be aligned with a state’s cur- 
ricular aspirations, it would seem that they 
would be ideal for appraising a school’s qual- 
ity. Unfortunately, that’s not the way it works 
out. When a state’s education officials decide 
to identify the skills and knowledge that stu- 
dents should master, the typical procedure for 
doing so hinges on the recommendations of
subject-matter specialists from that state.
For example, if authorities in Ohio or New 
Mexico want to identify their state’s offi- 
cial content standards for mathematics, then 
a group of, say, 30 math teachers, math- 
curriculum consultants, and university math 
professors are invited to form a statewide 
content-standards committee. Typically, when 
these committees attempt to identify the 
skills and knowledge the students should mas- 
ter, their recommendation — not surpris- 
ingly — is that students should master every- 
thing. These committees seem bent on identi- 
fying skills that they fervently wish students 
would possess. Regrettably, the resultant lita- 
nies of committee-chosen content standards 
tend to resemble curricular wish lists rather 
than realistic targets. 

Whether or not the targets make sense, 
there tend to be a lot of them, and the effect 
is counterproductive. A state’s standards- 
based tests are intended to evaluate schools 
based on students’ test performances, but 
teachers soon become overwhelmed by too 
many targets. Educators must guess about 
which of this multitude of content standards 
will actually be assessed on a given year’s test. 
Moreover, because there are so many content 
standards to be assessed and only limited test- 
ing time, it is impossible to report any mean- 
ingful results about which content standards 
have and haven’t been mastered. 
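
To see why many targets and limited testing time leave so little to report, consider a rough sketch. It assumes a hypothetical 60-item test (the article does not give a test length) with items spread evenly across curricular aims, and it uses the ordinary binomial standard error as a crude stand-in for how precisely mastery of a single aim can be estimated. The counts of 50 and 6 aims echo the contrast this article draws later between "50 or so" targets and "a half-dozen." The function names are illustrative only.

```python
import math

# Hypothetical test length; the article does not state one.
TOTAL_ITEMS = 60


def items_per_aim(total_items: int, num_aims: int) -> float:
    """Average number of items available to measure each curricular aim."""
    if total_items <= 0 or num_aims <= 0:
        raise ValueError("total_items and num_aims must be positive")
    return total_items / num_aims


def standard_error(p_correct: float, n_items: float) -> float:
    """Binomial standard error of a proportion-correct score based on n_items."""
    return math.sqrt(p_correct * (1.0 - p_correct) / n_items)


if __name__ == "__main__":
    for num_aims in (50, 6):
        n = items_per_aim(TOTAL_ITEMS, num_aims)
        se = standard_error(0.5, n)
        print(f"{num_aims} aims: {n:.1f} items per aim, standard error about {se:.2f}")
```

Under these assumed numbers, 50 aims leave barely more than one item per aim, while six aims leave ten items each, so a per-aim score rests on far firmer ground in the second case.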

After working with standards-based tests 
aimed at so many targets, teachers under- 
standably may devote less and less attention 
to those tests. As a consequence, students’ per- 
formances on this type of instructionally 
insensitive test often become dependent 
upon the very same SES factors that com- 
promise the utility of nationally standard- 
ized achievement tests when used for 
school evaluation. 
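
The score-spread point made earlier in this section rests on a simple statistical fact: for a question scored right or wrong, the variance in scores is p(1 - p), where p is the share of test takers who answer correctly. The sketch below is a minimal illustration of that fact, not a description of how any test publisher actually selects items; the function names and the list of candidate difficulties are assumptions made for the example.

```python
def item_variance(p_correct: float) -> float:
    """Variance of a right/wrong (1/0) item answered correctly with probability p_correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    return p_correct * (1.0 - p_correct)


def most_spreading_difficulty(candidates):
    """Return the candidate proportion-correct value that yields the largest item variance."""
    return max(candidates, key=item_variance)


if __name__ == "__main__":
    # Illustrative difficulty levels, chosen only for this example.
    difficulties = [0.1, 0.3, 0.5, 0.7, 0.9]
    for p in difficulties:
        print(f"proportion correct {p:.1f}: item variance {item_variance(p):.2f}")
    print("largest spread at:", most_spreading_difficulty(difficulties))
```

The variance peaks at 0.25 when half the test takers answer correctly and drops to 0.09 when only 10 percent or as many as 90 percent do, which is why items near the halfway mark are so attractive to anyone who needs score-spread.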

Wrong Tests, Wrong Consequences

Bad things happen when schools are evalu- 
ated using either of these two types of 
instructionally insensitive tests. This is partic- 
ularly true when the importance of a school 
evaluation is substantial, as it is now. All of the 
nation’s public schools are evaluated annu- 
ally under the provisions of the federal No 
Child Left Behind Act (NCLB). Not only are 
the results of the NCLB school-by-school 
evaluations widely disseminated, there are 
also penalties for schools that receive NCLB
funds yet fail to make sufficient test-based 
progress. These schools are placed on an 
improvement track that can soon “improve” 
them into nonexistence. Educators in 
America’s public schools obviously are under 
tremendous pressure to improve their stu- 
dents’ scores on whatever NCLB tests their 
state has chosen. 

With few exceptions, however, the assess- 
ments that states have chosen to implement 
because of NCLB are either nationally 
standardized achievement tests or state- 
developed standards-based tests — both of 
which are flawed. Here, then, are three 
adverse classroom consequences seen in 
states where instructionally insensitive 
NCLB tests are used: 

• Curricular reductionism. 

In an effort to boost their students’ NCLB test 
scores, many teachers jettison curricular con- 
tent that — albeit important — is not apt to be 
covered on an upcoming test. As a result, stu- 
dents end up educationally shortchanged. 

• Excessive drilling. 

Because it is essentially impossible to raise 
students’ scores on instructionally insensitive 
tests, many teachers — in desperation — 
require seemingly endless practice with items 
similar to those on an approaching accounta- 
bility test. This dreary drilling often stamps 
out any genuine joy students might (and 
should) experience while they learn. 

• Modeled dishonesty. 

Some teachers, frustrated by being asked to 
raise scores on tests deliberately designed to 
preclude such score raising, may be tempted 
to adopt unethical practices during the 
administration or scoring of accountability 
tests. Students learn that whenever the stakes 
are high enough, the teacher thinks it’s OK to 
cheat. This is a lesson that should never be 
taught. 

These three negative consequences of 
using instructionally insensitive standard- 
ized tests as measuring tools, taken together, 
make it clear that today’s widespread 
method of judging schools does more than 
lead to invalid evaluations. Beyond that, such 
tests can dramatically lower the quality of 
education. 



An Antidote 

Is it possible to build accountability tests that 
both supply accurate evidence of school qual- 
ity and promote instructional improvement? 
The answer is an emphatic yes. In 2001, prior 
to the enactment of NCLB, an independent 
national study group, the Commission on 
Instructionally Supportive Assessment, identi- 
fied three attributes that an “instructionally 
supportive” accountability test must possess: 

• A modest number of supersignificant
curricular aims.

To avoid overwhelming teachers and students
with daunting lists of curricular targets, an
instructionally supportive accountability test
should measure students’ mastery of only an
intellectually manageable number of curricular
aims, more like a half-dozen than the 50 or
so that a teacher may encounter today.
However, because fewer curricular benchmarks
are to be measured, they must be truly
significant.

• Lucid descriptions of aims.

An instructionally helpful test must be
accompanied by clear, concise, and teacher-palatable
descriptions of each curricular aim
to be assessed. With clear descriptions, teachers
can direct their instruction toward promoting
students’ mastery of skills and knowledge
rather than toward getting students to
come up with correct answers to particular
test items.

• Instructionally useful reports.

Because an accountability test that supports
teaching is focused on only a very limited
number of challenging curricular aims, a student’s
mastery of each subject can be meaningfully
measured, letting teachers determine
how effective their instruction has been.
Students and their parents can also benefit
from such informative reports.

These three features can produce an
instructionally supportive accountability test
that will accurately evaluate schools and
improve instruction. The challenge before us,
clearly, is how to replace today’s instructionally
insensitive accountability tests with better
ones. Fortunately, at least one state,
Wyoming, is now creating its own instructionally
supportive NCLB tests. More states
should do so.

What You Can Do

If you want to be part of the solution to this
situation, it’s imperative to learn all you can
about educational testing. Then learn some
more. For all its importance, educational testing
really isn’t particularly complicated,
because its fundamentals consist of commonsense
ideas, not numerical obscurities. You’ll
not only understand better what’s going on
in the current mismeasurement of school
quality, you’ll also be able to explain it to oth- 
ers. And those “others,” ideally, will be school 
board members, legislators, and concerned 
citizens who might, in turn, make a differ- 
ence. Simply hop on the Internet or head to 
your local library and hunt down an introduc- 
tory book or two about educational assess- 
ment. (I’ve written several such books that, 
though not as engaging as a crackling good 
spy thriller, really aren’t intimidating.) 

With a better understanding of why it is so 
inane — and destructive — to evaluate schools 
using students’ scores on the wrong species 
of standardized tests, you can persuade any- 
one who’ll listen that policy makers need to 
make better choices. Our 40-year saga of 
unsound school evaluation needs to end. 
Now.

W. James Popham, who began his career in 
education as a high school teacher in Oregon, 
is professor emeritus at the University of California- 
Los Angeles School of Education and Information 
Studies. Author of 25 books, he is a former presi- 
dent of the American Educational Research Asso- 
ciation. Write to letters@edutopia.org. 



Take the next step toward a better understand- 
ing of assessment by visiting the Edutopia Web 
site, where you’ll find articles and documentaries 
on alternative forms of assessment, interviews 
and opinion pieces by experts in the field, and 
a wealth of useful and informative resources, 
including an instructional module on building 
an evidence-based assessment. 
www.edutopia.org/assessment 